allowed us to type unknown individuals as DRw52a or DRw52b 
(ref. 22 and unpublished) in a sizeable number of normal and 
diseased individuals. This form of analysis can be of consider- 
able use in phylogenetic studies of human populations. 

Taken together, these results can account for the evolution 
of the DR genes in the DRw52 supertypic group (Fig. 4). The 
ancestral features of this family are the relatively recent 
duplication of the βI locus and the silencing of the βII locus by 
deletion of the first-domain-encoding exon 11. The duplicated DRβI 
loci then diverged into βI and βIII. Further divergence resulted in 
a branching into two lineages (DRw52a and DRw52b) based 
on common alleles at the less polymorphic locus, DRβIII. In 
the DRw52a group, the DRw6a haplotype gave rise to the DR3 
specificity by the gene conversion described here. The DRw6b 
haplotype was probably involved in an interchromosomal gene 
conversion with the DR4 βIII locus acting as donor. 

This analysis provides a framework for assigning serological 
specificities to the products of the different loci of the DRw52 
haplotypes. Allelic differences in the product of the DRβIII 
locus split DRw52 into a and b. It has already been shown that 
this locus encodes the DRw52 specificity for the case of the 
DRw6b haplotype 23. We propose that the distinct epitopes 
DRw52a and DRw52b (Fig. 4) will correspond to serological 
and T-cell specificities. In addition to the product of locus 
DRβIII, each haplotype obviously also expresses the product 
of its βI locus, which determines the fine DR specificity. 

The data described here represent an example of relatively 
rapid evolution of a multigene family in which the loci appear 
to diverge at different rates following a duplication event. The 
time of divergence may be estimated by analysing this group of 
haplotypes in other geographical (non-European) groups whose 
[...]

The divergence in this gene family is generated in part by 
gene conversion. Since this mechanism can involve the transfer 
of preselected epitopes, the resulting additional polymorphism 
is frequently maintained in the population. Therefore, even if 
gene conversion is a relatively rare event, it can play a major 
role in the generation of polymorphism by producing func- 
tionally effective variants. It is generally thought that this poly- 
morphism confers a selective advantage to a population in terms 
of its ability to cope with various pathogens. A genetic system 
with multiple loci undergoing conversion events could regener- 
ate polymorphism in populations which have undergone bottle- 
necks due to migration or adverse environmental factors. 

We thank Chantal Mattmann and Madeleine Zufferey for 
technical assistance, Dr B. Schwartz for providing sequence data 
for DR5 prior to publication and Dr J. J. van Rood for comments 
on the manuscript. This work was supported by a grant from 
the Swiss National Fund for Scientific Research. 

Note added in proof: From a recent analysis of micropolymorphism 
of HLA-DR4 (ref. 26), we propose that one DR4 βI 
allele, Dw10, has arisen by a gene conversion, with DRw6a βI 
acting as donor. 

Hepatitis B virus DNA integration in 
a sequence homologous to 
v-erb-A and steroid receptor 
genes in a hepatocellular carcinoma 

Hepatitis B virus (HBV) is clearly involved in the aetiology of 
human hepatocellular carcinoma (HCC) 1 and the finding of HBV 
DNA integration into human liver DNA in almost all HCCs 
studied 2-7 suggested that these integrated viral sequences may be 
involved in liver oncogenesis. Several HBV integrations in different 
HCCs 8,9 and HCC-derived cell lines 10-14 have been analysed after 
molecular cloning without revealing any obvious role for HBV. 
From a comparison of a HBV integration site present in a 
particular HCC 8 with the corresponding unoccupied site in the 
non-tumorous tissue of the same liver, we now report that HBV 
integration places the viral sequence next to a liver cell sequence 
which bears a striking resemblance to both an oncogene (v-erb-A) and 
the supposed DNA-binding domain of the human glucocorticoid 
receptor and human oestrogen receptor genes. We suggest that 
this gene, usually silent or transcribed at a very low level in normal 
hepatocytes, becomes inappropriately expressed as a consequence 
of HBV integration, thus contributing to the transformation. 

We have previously reported the molecular cloning of the 
single integrated viral sequence present in the liver tumorous 
nodule of patient D and we have determined the sequences of 
the cellular-viral junctions 8. The viral insertion was a continuous 
subgenomic fragment 1.4 kilobases (kb) long (Fig. 1a) containing 
the cohesive-end region, gene C and the beginning of gene 
pre-S1. We therefore used the 1.1-kb and 5.8-kb HindIII cellular 
fragments and the 1.8-kb EcoRI host-viral fragment (respectively 
referred to as LT (left tumour), RT (right tumour) and 
MT (medium tumour); Fig. 1a) to isolate the unoccupied site 
from a λ phage library of DNA extracted from the non-tumorous 
liver of patient D. This part of the liver did not seem to contain 
any integrated HBV sequences. Seven overlapping clones, 
hybridizing to one and/or the other of the three probes, were 
isolated and represented 32 kb of cellular DNA at the unoccupied 
site (Fig. 1b). Southern blots of restriction digests of the 
seven clones using total human DNA as a probe showed that 
the host sequence at the viral insertion site corresponds mainly 
to unique sequence DNA (Fig. 1a, b, solid bars). Comparison 
between the restriction maps of the unoccupied site (Fig. 1b) 
and the integrated site (Fig. 1a) did not reveal any major 
genomic rearrangements in the cellular DNA. Integration took 
place within a small EcoRI fragment of 400 base pairs (bp) 
which we subcloned and refer to as MNT (medium non-tumour) 
(Fig. 1b). 

To investigate whether HBV became integrated in the vicinity 
of a cellular gene in the human genome, we determined the 
nucleotide sequence 15 of the normal allele. This sequence 

Fig. 1 a, Restriction map of the occupied site. This has previously 
been isolated from a library of cellular DNA extracted from the 
tumorous part of the liver of patient D 8. The stippled region 
represents the 1.4-kb integrated HBV sequence. The arrowhead 
denotes the same orientation as the viral (+) strand 30. Solid bars 
denote unique sequences and open bars regions containing repetitive 
cellular sequences. Restriction sites are: R, EcoRI; B, BamHI; 
Bg, BglII; H, HindIII; X, XhoI. The cloned 1.1-kb HindIII, 1.8-kb 
EcoRI and 5.8-kb HindIII cellular fragments are referred to as 
LT, MT and RT. b, Restriction map of the unoccupied site. The 
cloning experiment in which partial MboI digests of cellular DNA 
were cloned into λ L47.1 was as described previously". A library, 
made with cellular DNA from the non-tumorous part of the liver 
of patient D, was screened with LT, MT and RT as probes. Seven 
positive overlapping clones were analysed, representing 32 kb of 
cellular DNA. The cloned 400-bp EcoRI cellular fragment is 
referred to as MNT. The site of HBV integration within MNT is 
indicated by an arrow. 

revealed the presence of an open reading frame (ORF) of 519 
nucleotides which was interrupted in the middle by the viral 
insertion (Fig. 2). Since methionine codons were present only 
at the 3' end of the ORF, this region can only correspond to a 
single exon from a split gene. Two acceptor-like (A-L1, A-L2) 
and two donor-like (D-L1, D-L2) consensus sequences of splice 
junctions 16 could be localized, respectively, at the 5' end and 
the 3' end of the ORF (Fig. 2). A computer-assisted analysis 17 
showed that the region between A-L2 and D-L1 had a strong 
probability of being an exon. A search for related sequences in 
the NBRF protein data bank 18 identified a remarkable homology 
(?2 identities out of 49) between the translation product of the 
exon-like region of the ORF (amino-acid residues 69-117) and 
the amino-terminal region of the v-erb-A oncogene product 
(residues 8-58) 19 (Fig. 3). Moreover, a significant homology has 
recently been reported between the v-erb-A protein and either 
the human glucocorticoid receptor (hGR) 20 or the human 
oestrogen receptor (ER) 21. The alignment of the amino-acid 
sequence of hGR and ER with the exon-like protein product 
showed, in both cases, 19 identities out of 49 (Fig. 3). As already 
observed for v-erb-A, hGR 20 and ER 21, the identity became 
greater beyond amino acid 96 of the ORF, which corresponds 
to a cysteine-rich region. In particular, the four cysteines 
conserved between v-erb-A, hGR and ER are present at the same 
position in the ORF. Although the homology continues for 
v-erb-A, hGR and ER (all derived from cDNA clones), no 
significant homology was found beyond residue 117 for the 
ORF. However, since the D-L1 sequence (next to residue 117) 
is strictly identical to the consensus sequence of a donor site, 
it could thus correspond to an exon/intron boundary. 
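
The comparison of a candidate site such as D-L1 with a donor consensus can be illustrated with a short position-by-position scoring routine. This is only a minimal sketch, not the program used in this work: it assumes the widely cited nine-base donor consensus (C/A)AG/GT(A/G)AGT, and the function name, the threshold parameter and the test sequences are hypothetical.

```python
# Assumed donor consensus (C/A)AG/GT(A/G)AGT, one set of allowed bases per
# position. The first three positions are exonic, the last six intronic.
DONOR_CONSENSUS = ("CA", "A", "G", "G", "T", "AG", "A", "G", "T")
EXON_BASES = 3  # number of consensus positions upstream of the boundary


def donor_like_sites(seq, min_matches=9):
    """Return (boundary_index, matches) for every window of the sequence that
    agrees with the donor consensus at min_matches or more of its 9 positions.
    boundary_index is the 0-based index of the first intronic base."""
    seq = seq.upper()
    width = len(DONOR_CONSENSUS)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        matches = sum(base in allowed
                      for base, allowed in zip(window, DONOR_CONSENSUS))
        if matches >= min_matches:
            hits.append((start + EXON_BASES, matches))
    return hits


# Hypothetical toy sequence carrying one perfect donor site.
print(donor_like_sites("TTTTCAGGTAAGTCCCC"))
```

With the default threshold only sites strictly identical to the consensus are reported, which is the criterion quoted for D-L1; lowering `min_matches` would also report merely donor-like sites.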

The integration of HBV sequences interrupted the cellular 
reading frame and generated a microdeletion of 7-12 bp 
(boxed in Fig. 4). This minor rearrangement provides evidence 
that the situation we are studying in patient D is probably very 
close to the initial integration event. In addition to the 
microdeletion, the viral integration, interrupting the cellular ORF, 
generated a new viral-host hybrid sequence such that the first 
29 codons of the viral pre-S1 gene became fused and in phase 
with the last 28 codons of the cellular exon (Fig. 4). Remarkably, 

Fig. 2 Nucleotide sequence of the unoccupied site. Nucleotides 
are numbered at the left side. The deduced amino-acid sequence 
of the 519-bp open reading frame is shown above the nucleotide 
sequence. The amino-acid sequence is numbered from the first 
codon of the ORF. A large number of splice junction sequences 
have been reported 16. The compilation of the data supports the 
consensus (T/C)nN(C/T)AG/G for acceptors and the consensus 
(C/A)AG/GT(A/G)AGT for donors. The two acceptor-like (A-L1 and A-L2) 
and donor-like (D-L1 and D-L2) sequences are underlined. The 
site of HBV integration in the middle of ORF is indicated by an 
arrow. The cloned 3.4-kb HindIII fragment, encompassing the 
unintegrated site in the normal allele, was sonicated, treated with 
the Klenow fragment of DNA polymerase plus deoxyribonucleotides 
(2 h, 15 °C) and fractionated by agarose gel electrophoresis. 
Fragments of 400-700 bp were excised and electroeluted. DNA 
was ethanol-precipitated, ligated to dephosphorylated SmaI-cleaved 
M13 mp8 replicative form DNA and transfected into 
Escherichia coli strain TG-1 by the high-efficiency technique of 
Hanahan 31. Recombinant clones were detected by plaque hybridization 
using the MNT (Fig. 1b) subclone DNA as a probe. Single-stranded 
templates were prepared from plaques exhibiting positive 
hybridization signals and were sequenced by the dideoxy chain 
termination procedure 15 using buffer gradient gels. 

the viral genome became integrated a few nucleotides upstream 
from the most conserved Cys-rich portion of the ORF (Fig. 3), 
maintaining the integrity of this region. 
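
The ORF interrupted by the integration was described above as a 519-nucleotide stretch, that is 173 codons uninterrupted by a stop codon. The following minimal sketch shows how such a stretch can be located in one reading frame. It is an illustration under stated assumptions (standard stop codons TAA, TAG and TGA, a single strand, a hypothetical function name and a synthetic test sequence), not the analysis performed here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def longest_stop_free_run(seq, frame=0):
    """Return (start, end) of the longest run of complete codons without a
    stop codon in the given reading frame (0, 1 or 2) of one strand.
    Indices are 0-based and end is exclusive."""
    seq = seq.upper()
    best_start = best_end = frame
    run_start = frame
    pos = frame
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            # A stop codon closes the current run.
            if pos - run_start > best_end - best_start:
                best_start, best_end = run_start, pos
            run_start = pos + 3
        pos += 3
    # Close the run that reaches the end of the sequence.
    if pos - run_start > best_end - best_start:
        best_start, best_end = run_start, pos
    return best_start, best_end


# Synthetic example: 173 alanine codons flanked by two stop codons.
toy = "TAG" + "GCC" * 173 + "TGA"
start, end = longest_stop_free_run(toy, frame=0)
print(end - start, (end - start) // 3)  # length in nucleotides and codons
```

Because the run is defined by the absence of stop codons rather than by an initiating ATG, the same scan applies to an internal exon, which need not begin with a methionine codon.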

Using a panel of 17 mouse-human and Chinese hamster-human 
somatic cell hybrid DNAs 22,23, we localized the ORF to 
chromosome 3 (data not shown), while the c-erb-A oncogene 24, 
hGR 25 and ER 26 have been mapped, respectively, to human 
chromosomes 17, 5 and 6. In preliminary experiments, we 
hybridized the MNT probe, encompassing the exon-like region 
of ORF, to a Northern blot of polyadenylated RNAs extracted 
from five human livers, but found no detectable transcripts. A 
large number of human fetal and adult tissues will have to be 
tested similarly to reveal any active transcription of this region. 

The conserved Cys-rich region which extends over 60 amino-acid 
residues in v-erb-A protein, hGR and ER is thought to 
include the DNA-binding domain of the molecule 20,21. We can 
thus speculate that the corresponding homologous region of 
ORF, truncated by the exon-intron boundary, is part of a cellular 
gene that shares a common functional domain with hGR, ER 

Fig. 3 Amino-acid sequence alignment of the 
v-erb-A oncogene protein 19, translation 
product of the exon-like region of ORF, the 
human glucocorticoid receptor (hGR) 20 and the 
oestrogen receptor (ER) 21. The limits of the 
exon-like region of ORF, defined by A-L2 and 
D-L1 boundaries, are indicated by rectangles. 
To predict the location of exon-like regions, we 
used the discriminating program PREDICTOR 17. 
Two subsets of the GenBank data library, containing 
either only exon or only intron sequences, 
were taken as reference pool. The program 
PROBE3-EXPLORJ (ref. 18), allowing the search 
for ambiguous nucleic or peptidic patterns, was 
used to screen both the NBRF (proteins) and 
the GenBank (DNA) data banks. These programs 
were run on a MV8000 32-bit minicomputer. 
Amino-acid residues 69-117 from the 
ORF were aligned with amino-acid residues 
8-58 from p75 gag-erb-A, residues 397-442 from 
hGR and residues 115-206 from ER. Identical residues are boxed and gaps are indicated by dashes. The HBV integration site, upstream from 
the cysteine-rich region, is indicated. 

and v-erb-A gene products and which could exert a transcriptional 
regulatory function on specific genes. 
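
Identity counts such as the 19 out of 49 quoted for the alignment of the exon-like region with hGR and ER can be obtained by comparing two aligned sequences column by column. The sketch below is a hedged illustration only: the function name is hypothetical, the short peptide strings are invented and are not the sequences of Fig. 3, and columns containing a gap are simply skipped, which may differ from the convention used for the figure.

```python
def count_identities(aligned_a, aligned_b, gap="-"):
    """Count identical residues between two aligned sequences of equal
    length. Columns in which either sequence has a gap are not compared.
    Returns (identities, compared_columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identities = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identities += 1
    return identities, compared


# Invented toy alignment (not taken from Fig. 3).
print(count_identities("CYRC-KA", "CFRCGKS"))

# The figure quoted in the text expressed as a percentage.
print(round(100 * 19 / 49, 1))
```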

Although the way in which HBV participates in the formation 
of a liver cancer is unknown, the experiments reported here 
could promote our understanding of one possible mechanism 
of HBV carcinogenesis. In patient D the viral integration, inter- 
rupting the exon-like region (Fig. 2), created a chimaeric viral- 
host open reading frame (Fig. 4). The HBV insertion took place 
a few nucleotides upstream from the beginning of the putative 
DNA-binding domain. Since a viral promoter has been defined 
by in vitro transcription approximately 30 nucleotides upstream 
from the initiator codon of the pre-Sl gene 27 , we suggest that, 
in the tumorous part of the liver, a readthrough transcription 
occurred from the viral promoter. Although protein or RNA 
from the tumour is no longer available to test this hypothesis, 
it is most probable that inappropriate activation of the putative 
gene as a consequence of HBV integration resulted in expression 
of a truncated protein at greater levels than that of the native 
protein. This protein could participate directly in the subsequent 
cell transformation. 

Several arguments suggest that hormonal factors are involved 
in human hepatocarcinogenesis. The incidence of HCC is three- 
to sixfold greater in males*, and the use of oral contraceptives 
in females is associated with the development of hepatic 
adenoma 28 . Moreover, the ability of oestrogenic hormones to 
function as promoters of neoplastic development in rat liver has 
been clearly demonstrated 29 . The finding that HBV sequences 
have become integrated into a putative cellular gene sharing 
homology with the steroid receptor genes is therefore intriguing 
and suggests that, in some cases, hormonal and HBV car- 
cinogenesis may be directly related. 

Fig. 4 DNA sequences at the HBV integration site. The sequences 
of the left and right host-viral DNA junctions at the occupied site 
(upper sequence) are aligned with the human DNA sequence 
at the unoccupied site (lower sequence). The bold-face letters 
indicate the viral sequence. Nucleotides of the ORF and the HBV 
genome 33 are numbered. Homologous nucleotides between the two 
sequences are indicated by sloping lines. The 7-bp CACTTCC 
present in the normal allele and deleted after the viral integration 
in the occupied site is boxed. Because HBV DNA and cellular 
DNA shared a 2-bp and 3-bp sequence homology at a point 
coincident, respectively, with the left and right host-viral junctions 
(dashed lines), the deleted fragment could be up to 12 bp long. 
The DR2 copy of the 11-bp viral direct repeat specifically involved 
in HBV integration 8 is indicated. The putative chimaeric protein, 
generated by the HBV insertion, between the first 29 amino acids 
of the viral pre-S1 gene product and the last 28 amino acids of the 
cellular exon protein product is partially represented at the fusion 
site. 